Startup speed, event filtering, log hygiene & reliability #272
Conversation
- Parallelize persona initialization: batched 6-concurrent (18-36s → 3-6s); a sketch of the batching pattern follows this list
- Parallelize PersonaUser.initialize(): genome sync, rate limiter, and corpus load run concurrently
- EventsDaemonBrowser: filter-first DOM dispatch (only events with registered interest)
- Events.subscribe(): registers DOM interest automatically in the browser
- BaseWidget/BaseContentWidget: register DOM interest for event dispatchers
- WidgetEventServiceBrowser: track and clean up document listeners on disconnect
- ResponseCorrelator: periodic cleanup timer (60s sweep) + destroy() on disconnect
- JTAGClient.disconnect(): calls correlator.destroy() to stop orphaned timers
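For reference, a minimal sketch of the batched-concurrent pattern described above; the function and parameter names are illustrative, not the actual PersonaUser/UserDaemonServer code:

```typescript
// Minimal sketch of batched-concurrent initialization (illustrative names).
async function initializeInBatches<T>(
  items: T[],
  initialize: (item: T) => Promise<void>,
  batchSize = 6
): Promise<void> {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Each batch of 6 runs concurrently; batches themselves run sequentially,
    // which bounds peak ORM/IPC load during startup.
    await Promise.all(batch.map(item => initialize(item)));
  }
}
```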
- UserEntityCache: in-memory TTL cache for UserEntity reads, 60s expiry (see the sketch after this list)
- CallerDetector: use cached reads instead of raw ORM.read per command
- SessionDaemonServer: getUserById uses cached UserEntity reads
- UserDaemonServer: monitoring loop 5s → 30s, populates cache on query
- Combined with Round 1 (LogBatcher, EventRateLimiter, ORM race guards), total errors eliminated: ~2,200 → 0 per session
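A minimal sketch of a read-through TTL cache in the spirit of UserEntityCache; the class shape and the ORM call in the usage comment are assumptions, not the real API:

```typescript
// Minimal sketch of a read-through TTL cache (illustrative, not the actual UserEntityCache).
class TTLCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs = 60_000) {}

  async getOrLoad(key: string, load: (key: string) => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // fresh hit: skip the ORM round-trip
    const value = await load(key);                           // miss or stale: read through
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Hypothetical usage from a caller such as CallerDetector:
//   const user = await userCache.getOrLoad(userId, id => ORM.read('users', id));
```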
- AdapterHealthMonitor: per-adapter backoff (base × 2^failures, 5-minute ceiling) so unhealthy providers don't spam checks every 30s but still recover (see the sketch after this list)
- BaseOpenAICompatibleAdapter: throttle insufficient_funds/rate_limited logs to once per 5 minutes per status instead of on every API call
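A minimal sketch of the backoff calculation, assuming the 30s base interval and 5-minute ceiling stated above; the function name is illustrative:

```typescript
// Illustrative exponential-backoff calculation, not the actual AdapterHealthMonitor code.
function nextCheckDelayMs(consecutiveFailures: number, baseMs = 30_000): number {
  const ceilingMs = 5 * 60_000;
  // base × 2^failures, capped so an unhealthy adapter is still re-checked every 5 minutes
  return Math.min(baseMs * 2 ** consecutiveFailures, ceilingMs);
}
```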
- Remove 45+ debug console.error calls left from the session identity investigation
- Raise ORM SLOW thresholds 50ms/100ms → 1000ms (5689 → 192 lines)
- Exclude fire-and-forget log/write-batch from TimingHarness slow logging
- Remove per-init console.logs: DatabaseHandleRegistry, ResourceManager, AIAudioInjector TTS subscribe, AIGenerateServerCommand
- Steady-state log growth: continuous → 0 lines/30s
Each entity update event was triggering two console.log calls per entity, flooding the browser console with update notifications.
Pull request overview
This PR targets faster startup and improved runtime reliability by reducing IPC/log overhead, caching high-frequency ORM reads, and filtering/cleaning noisy browser/server events and logs.
Changes:
- Added Rust/TS log batching (log/write-batch + LogBatcher) to reduce logger IPC load and contention.
- Introduced an in-memory TTL UserEntityCache and pre-warming to reduce redundant ORM reads/timeouts during startup.
- Implemented DOM "interest-based" event dispatch filtering in the browser daemon, plus assorted log hygiene, backoff/throttling, and lifecycle cleanup fixes.
Reviewed changes
Copilot reviewed 35 out of 36 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/workers/continuum-core/src/modules/logger.rs | Adds log/write-batch payload/result types and handler in Rust logger worker. |
| src/widgets/shared/EntityScroller.ts | Removes noisy console logging for entity updates. |
| src/widgets/shared/BaseWidget.ts | Registers DOM interest for widget events before attaching document listeners. |
| src/widgets/shared/BaseContentWidget.ts | Registers DOM interest for content events and attaches document listeners. |
| src/widgets/browser/services/WidgetEventServiceBrowser.ts | Tracks/removes DOM listeners; registers/unregisters DOM interest for events. |
| src/system/voice/server/AIAudioInjector.ts | Removes prototype console logging. |
| src/system/user/server/modules/PersonaTaskExecutor.ts | Adds ORM update race guard for task deletion during execution. |
| src/system/user/server/modules/PersonaAutonomousLoop.ts | Adds ORM update race guard for task deletion before execution. |
| src/system/user/server/UserEntityCache.ts | New TTL cache for read-through UserEntity reads + bulk prewarm API. |
| src/system/user/server/PersonaUser.ts | Parallelizes Rust bridge syncing and corpus load during persona init. |
| src/system/user/server/CallerDetector.ts | Switches user reads to UserEntityCache. |
| src/system/resources/shared/ResourceManager.ts | Removes non-actionable console logs. |
| src/system/core/shared/TimingHarness.ts | Suppresses slow-operation console logs for log/write-batch. |
| src/system/core/shared/ResponseCorrelator.ts | Adds periodic cleanup timer + destroy() for lifecycle shutdown. |
| src/system/core/shared/Events.ts | Tracks DOM interest for browser subscriptions (used for browser-side dispatch filtering). |
| src/system/core/logging/Logger.ts | Instantiates/destroys LogBatcher; hides direct worker client behind batcher. |
| src/system/core/logging/LogBatcher.ts | New batching layer for sending many log writes in one IPC request. |
| src/system/core/logging/ComponentLogger.ts | Routes Rust logging through LogBatcher instead of per-message IPC. |
| src/system/core/client/shared/JTAGClient.ts | Ensures ResponseCorrelator.destroy() is called on shutdown. |
| src/system/core/client/server/JTAGClientServer.ts | Removes noisy console logs; clarifies “no local system auto-create” behavior. |
| src/shared/version.ts | Version bump. |
| src/shared/ipc/logger/LoggerWorkerClient.ts | Adds writeLogsBatch() that sends log/write-batch. |
| src/shared/ipc/logger/LoggerMessageTypes.ts | Adds batch payload/result TS types. |
| src/package.json | Version bump. |
| src/package-lock.json | Version bump. |
| src/generated-command-schemas.json | Regeneration timestamp update. |
| src/daemons/user-daemon/server/UserDaemonServer.ts | Batches persona initialization; slows monitoring interval; prewarms UserEntityCache. |
| src/daemons/session-daemon/server/SessionDaemonServer.ts | Switches user reads to UserEntityCache; removes debug spam. |
| src/daemons/events-daemon/shared/EventsDaemon.ts | Adjusts event rate limiter thresholds + log formatting. |
| src/daemons/events-daemon/browser/EventsDaemonBrowser.ts | Adds DOM interest-based filtering before dispatching DOM CustomEvents. |
| src/daemons/data-daemon/shared/ORMLogger.ts | Raises “slow ORM” warning threshold to reduce noise. |
| src/daemons/data-daemon/server/ORMRustClient.ts | Raises “slow IPC” warning threshold to reduce noise. |
| src/daemons/data-daemon/server/DatabaseHandleRegistry.ts | Removes console logs during handle registry initialization/registration. |
| src/daemons/ai-provider-daemon/shared/adapters/BaseOpenAICompatibleAdapter.ts | Throttles repeated per-status error logs (e.g., insufficient funds). |
| src/daemons/ai-provider-daemon/server/AdapterHealthMonitor.ts | Adds exponential backoff and milestone logging for failing adapter health checks. |
| src/commands/ai/generate/server/AIGenerateServerCommand.ts | Removes debug console logs; returns result directly. |
Files not reviewed (1)
- src/package-lock.json: Language not supported
Comments suppressed due to low confidence (2)
src/widgets/shared/BaseContentWidget.ts:303
- setupContentEventListeners() registers DOM interest and attaches three document-level listeners, but there is no corresponding removal/unregister in onWidgetCleanup(). Since BaseContentWidget instances can be connected/disconnected, this will leak listeners and keep DOM interest permanently enabled for these events. Track listener references + unregister functions and dispose them during cleanup.
src/widgets/shared/BaseWidget.ts:801
- setupEventDispatcher() registers DOM interest but drops the returned unregister function, and adds a document.addEventListener with an inline callback that is never removed in disconnectedCallback(). This defeats the "interest-based dispatch" goal over time (interest leaks) and can leak listeners across widget lifecycles. Store both the unregister closure and the event listener function (or use an AbortController) and clean them up in disconnectedCallback()/onWidgetCleanup(). A cleanup sketch follows.
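A minimal sketch of the cleanup pattern the review suggests, using AbortController; registerDOMInterest and the widget class are hypothetical stand-ins, not the actual BaseWidget API:

```typescript
// Hypothetical stand-in for the interest-registration helper described above.
const domInterestSet = new Set<string>();
function registerDOMInterest(eventName: string): () => void {
  domInterestSet.add(eventName);
  return () => domInterestSet.delete(eventName);
}

class ExampleWidget extends HTMLElement {
  private listenerAbort = new AbortController();
  private unregisterInterest: Array<() => void> = [];

  connectedCallback(): void {
    // Keep the unregister closure so interest can be released on disconnect
    this.unregisterInterest.push(registerDOMInterest('entity:updated'));
    // Listeners registered with a signal are removed automatically on abort()
    document.addEventListener('entity:updated', () => this.onEntityUpdated(), {
      signal: this.listenerAbort.signal,
    });
  }

  disconnectedCallback(): void {
    this.listenerAbort.abort();                  // removes all document listeners at once
    this.unregisterInterest.forEach(un => un()); // releases DOM interest
    this.unregisterInterest = [];
    // Note: a widget that reconnects would need a fresh AbortController here.
  }

  private onEntityUpdated(): void { /* widget-specific handling */ }
}
```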
```typescript
} catch {
  // Task was deleted between dequeue and completion — work was still done, just can't record it
  this.log(`⚠️ ${this.displayName}: Task ${task.taskId.slice(0, 8)} vanished during execution (deleted externally?)`);
```
This catch swallows all ORM.update failures and assumes the task was deleted. ORM.update can throw for other reasons (IPC timeout, Rust error, read-after-update failure), so this can hide real persistence failures and make debugging harder. Capture the error and only suppress known "not found"/deleted-task cases; otherwise log (or rethrow) unexpected errors.
Suggested change:

```typescript
} catch (error) {
  const errorMessage =
    error && typeof error === 'object' && 'message' in error && typeof (error as any).message === 'string'
      ? (error as any).message
      : String(error);
  const errorCode =
    error && typeof error === 'object' && 'code' in error ? (error as any).code : undefined;
  const looksLikeNotFound =
    (typeof errorMessage === 'string' && /not\s*found/i.test(errorMessage)) ||
    (typeof errorMessage === 'string' && /No document with id/i.test(errorMessage)) ||
    errorCode === 'NOT_FOUND';
  if (looksLikeNotFound) {
    // Task was deleted between dequeue and completion — work was still done, just can't record it
    this.log(
      `⚠️ ${this.displayName}: Task ${task.taskId.slice(0, 8)} vanished during execution (deleted externally? ORM.update not-found: ${errorMessage})`
    );
  } else {
    // Unexpected persistence failure — log full details to aid debugging
    this.log(
      `❌ ${this.displayName}: Failed to update task ${task.taskId.slice(0, 8)} in ORM.update: ${errorMessage}`
    );
  }
```
```typescript
} catch {
  // Task was deleted between dequeue and execution — skip it
  this.log(`⚠️ ${this.personaUser.displayName}: Task ${item.taskId.slice(0, 8)} vanished before execution`);
  return;
```
This catch treats any ORM.update failure as "task vanished" and returns. ORM.update can fail for reasons other than deletion, so this could cause tasks to be silently skipped during transient DB/IPC issues. Capture and inspect/log the error; only skip when the error indicates a missing/deleted task, and consider retry/backoff for transient failures.
Suggested change:

```typescript
} catch (error: any) {
  const message = error?.message ?? String(error);
  const isNotFoundError =
    error?.code === 'NOT_FOUND' ||
    error?.name === 'NotFoundError' ||
    /not\s*found|missing|deleted/i.test(message);
  if (isNotFoundError) {
    // Task was deleted between dequeue and execution — skip it
    this.log(
      `⚠️ ${this.personaUser.displayName}: Task ${item.taskId.slice(0, 8)} vanished before execution (${message})`
    );
    return;
  }
  // Non-deletion failure — log, retry once, and propagate on persistent failure
  this.log(
    `⚠️ ${this.personaUser.displayName}: Failed to mark task ${item.taskId.slice(0, 8)} as in_progress: ${message}`
  );
  try {
    await ORM.update<TaskEntity>(
      COLLECTIONS.TASKS,
      item.taskId,
      { status: 'in_progress', startedAt: new Date() }
    );
  } catch (retryError: any) {
    const retryMessage = retryError?.message ?? String(retryError);
    this.log(
      `❌ ${this.personaUser.displayName}: Retry to mark task ${item.taskId.slice(0, 8)} as in_progress failed: ${retryMessage}`
    );
    throw retryError;
  }
```
```typescript
constructor(defaultTimeoutMs: number = 30000) {
  this.defaultTimeoutMs = defaultTimeoutMs;
  // Periodic sweep for orphaned requests (every 60s)
  this.cleanupTimer = setInterval(() => this.cleanup(), 60_000);
```
ResponseCorrelator starts a setInterval in the constructor. In Node, this timer can keep the process alive if shutdown() is not called (e.g., short-lived scripts/tests). Consider calling unref() on the interval handle when available, or lazily creating the interval only when there are pending requests and clearing it when the map becomes empty.
Suggested change:

```typescript
this.cleanupTimer = setInterval(() => this.cleanup(), 60_000);
// In Node.js, prevent this interval from keeping the process alive on its own
(this.cleanupTimer as any)?.unref?.();
```
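The comment also mentions creating the interval lazily as an alternative to unref(); a minimal sketch of that approach, with illustrative names rather than the actual ResponseCorrelator internals:

```typescript
// Sketch of lazy cleanup-timer management: the sweep only runs while requests are pending.
class LazySweeper {
  private pending = new Map<string, { expiresAt: number }>();
  private cleanupTimer?: ReturnType<typeof setInterval>;

  track(id: string, timeoutMs: number): void {
    this.pending.set(id, { expiresAt: Date.now() + timeoutMs });
    // Start the sweep only when there is something to sweep
    if (!this.cleanupTimer) {
      this.cleanupTimer = setInterval(() => this.cleanup(), 60_000);
    }
  }

  resolve(id: string): void {
    this.pending.delete(id);
    this.stopIfIdle();
  }

  private cleanup(): void {
    const now = Date.now();
    for (const [id, entry] of this.pending) {
      if (entry.expiresAt <= now) this.pending.delete(id);
    }
    this.stopIfIdle();
  }

  private stopIfIdle(): void {
    // Clear the interval when nothing is pending so it cannot keep the process alive
    if (this.cleanupTimer && this.pending.size === 0) {
      clearInterval(this.cleanupTimer);
      this.cleanupTimer = undefined;
    }
  }
}
```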
```rust
let count = batch.entries.len();
for entry in batch.entries {
    // Queue each entry through the existing channel (writer thread handles actual I/O)
    let _ = self.log_tx.try_send(entry);
}
```
handle_write_batch() returns entriesQueued as the batch length, but the loop uses try_send() and ignores send failures. If the channel is full, some entries will be dropped while the response still reports them as queued. Track try_send() successes (and optionally dropped count) so the result reflects reality and you can observe backpressure/drops.
```rust
fn handle_write_batch(&self, params: Value) -> Result<CommandResult, String> {
    // Extract payload (WorkerClient nests under "payload", support both patterns)
    let payload_value = if let Some(nested) = params.get("payload") {
        nested.clone()
    } else {
        params.clone()
    };

    let batch: WriteLogBatchPayload =
        serde_json::from_value(payload_value).map_err(|e| format!("Invalid batch payload: {e}"))?;

    let count = batch.entries.len();
    for entry in batch.entries {
        // Queue each entry through the existing channel (writer thread handles actual I/O)
        let _ = self.log_tx.try_send(entry);
    }

    self.requests_processed.fetch_add(1, Ordering::Relaxed);

    CommandResult::json(&WriteLogBatchResult {
        entries_queued: count,
    })
}
```
New log/write-batch behavior is not covered by tests. Since this file already has unit tests for log/ping and log/write, add a test that calls handle_command("log/write-batch", ...) and asserts it succeeds and returns a payload with entriesQueued matching the number actually accepted.
```typescript
document.addEventListener(patternOrEventName, eventHandler);

// Register DOM interest so EventsDaemonBrowser dispatches this event to DOM
this.domInterest.add(patternOrEventName);
```
Events.domInterest is a Set, and the unsubscribe closure always deletes the event name. If multiple callers subscribe to the same DOM event, the first unsubscribe will delete the interest and can stop EventsDaemonBrowser from dispatching that event to the DOM even though other listeners are still registered. Use a reference count (e.g., Map<string, number>) or another mechanism that only removes interest when the last listener unsubscribes.
```typescript
public static registerDOMInterest(eventName: string): () => void {
  EventsDaemonBrowser._domInterest.add(eventName);
  return () => {
    EventsDaemonBrowser._domInterest.delete(eventName);
  };
```
registerDOMInterest() uses a Set and the returned unregister function unconditionally deletes the event name. If multiple widgets/services register the same event, the first cleanup will delete interest for all and can prevent DOM dispatch for remaining listeners. Consider ref-counting interests (Map<string, number>) so unregister only removes when the last registrant cleans up.
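A minimal sketch of the ref-counted interest registry suggested in both comments above; the names echo the review's suggestion rather than the actual Events/EventsDaemonBrowser implementation:

```typescript
// Ref-counted DOM interest: interest is only removed when the last registrant unsubscribes.
const domInterest = new Map<string, number>();

function registerDOMInterest(eventName: string): () => void {
  domInterest.set(eventName, (domInterest.get(eventName) ?? 0) + 1);
  let released = false;
  return () => {
    if (released) return;            // make unregister idempotent
    released = true;
    const count = domInterest.get(eventName) ?? 0;
    if (count <= 1) {
      domInterest.delete(eventName); // last registrant gone: stop DOM dispatch
    } else {
      domInterest.set(eventName, count - 1);
    }
  };
}

function hasDOMInterest(eventName: string): boolean {
  return domInterest.has(eventName); // used by the dispatch filter
}
```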